U.S.  DEPARTMENT  OF  COMMERCE 
NstiiMl  TtcMcM  liMfKitNiR  Smict 


AD-A032  348 


Rank  Degeneracy  and  Least 
Squares  Problems 

Stanford  Univ  Calif  Dept  of  Computer  Science 


Aug  76 


V. 


wWRKmmM 

1.  Renan  No.  It 

STAN-C3-70-550  | 

X Recipient'*  Acccaaion  No. 

4.  Title  and  Subtitle 

1 II 

RANK  DEGENERACY  AHI  LEAST  SQUARES  PRQbLEMS 

«. 

7.  Authot(s) 

Gene  Golub,  Virginia  Klema,  G.  V.'.  Stewart 

X Performing  Organisation  Rep. 

N*.  s?AN-CS-76-55c» 

f.  I’rtfiHMiltf  OlKMiuiiM  N.W  jnJ  Adder  *% 

STANFORD  UNIVERSITY 

If.  Ptoject/Tnak/Wotk  Unit  No. 

COMPUTER  SCIENCE  DEPARTMENT 
STANFORD,  CA.  9^30$ 

11.  Contract /Grant  No. 

NCC01L-7O-C-0391 

IX  Sponaotinn  Organization  Nmh  and  Address 

IX  Type  of  Report  t Period 

Office  of  Naval  Research 
Department  of  the  Navy 
Arlington,  Virginia  22217 

Technical 

14. 

IS.  Supple  nest  sty  Notes 

114*  Abftt.acts 

This  paper  is  concerned  with  least  squares  problems  when  the  least  squares  matrix 
A is  near  a matrix  that  is  not  of  full  rank.  A definition  of  numerical  rank 
is  given.  It  is  shown  that  under  certain  conditions  when  A has  numerical  rank 
r there  is  a distinguished  r dimensional  subspace  of  the  column  space  of  A 
that  is  insensitive  to  how  it  is  approximated  by  r independent  columns  of  A. 
The  consequences  of  this  fact  for  the  least  squares  problem  are  examined. 
Algorithms  are  described  for  approximating  the  stable  part  of  the  column  space 
of  A. 

17*  Key  tWJ»  **J  Docwrpm  Amlyxiii.  17*  Descriptor* 

ITfc.  Identifier* /Open-Ended  Tenai 

17c.  f.OSATI  FieldAWoop 

• 

/ 

Approved  for  public  release; distribution  unlimited. 


rom*  NTt»-»t  na-rai 


Report) 


eurity  < 


T£Tii&  " 


IS**  • HVC 

yjhce  •~3‘cC 


uteoMM-oc 


ADA082348 


1 


Rank  Degeneracy  and  Least  Squares  Problems# 

Gene  Golub4 
Stanford  University 
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G.  W.  Stewart* 

University  of  Maryland 


Abstract 


This  paper  is  concerned  with  least  squares  problems  when  the  least 
squares  matrix  A is  near  a matrix  that  is  not  of  full  rank. 

A definition  of  numerical  rank  is  given.  It  is  shown  that  under 
certain  conditions  when  A has  numerical  rank  r there  is  a dis- 
tinguished r dimensional  subspace  of  the  column  space  of  A 
that  is  insensitive  to  how  it  is  approximated  by  r independent 
columns  of  A.  The  consequences  of  this  fact  for  the  least  squares 
problem  are  examined.  Algorithms  are  described  for  approximating 
the  stable  part  of  the  colum  space  of  A. 
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1.  Introduction 

In  this  paper  we  shall  he  concerned  with  the  following  problem. 

Let  A be  an  m * n matrix  with  m > n,  and  suppose  that  A is  near  (in 
a sense  to  be  made  precise  later)  a matrix  B whose  rank  is  less  than 
n.  Can  one  find  a set  of  linearly  independent  coltans  of  A that  span 
a good  approximat:'Mi  to  the  colirnn  space  of  B? 

The  solution  of  this  problem  is  isqwrtant  in  a nuaber  of  applica- 
tions. In  this  paper  we  shall  be  chiefly  interested  in  the  case  where 
the  coltans  of  A represent  factors  or  carriers  in  a linear  model 
which  is  to  be  fit  to  a vector  of  observations  b.  In  some  such  applica- 
tions, where  the  elements  of  A can  be  specified  exactly  (e.g.  the 
analysis  of  variance) , the  presence  of  rank  degeneracy  in  A can  be 
dealt  with  by  explicit  mathematical  formulas  and  causes  no  essential 
difficulties.  In  other  applications,  however,  the  presence  of  degeneracy 
is  not  at  all  obvious,  and  the  failure  to  detect  it  can  result  in  meaning- 
less results  or  even  the  catastrophic  failure  of  the  numerical  algorithms 
being  used  to  solve  the  problem. 

The  organization  of  this  paper  is  the  following.  In  the  next  sec- 
tion we  shall  give  a precise  definition  of  approximate  degeneracy  in  terns 
of  the  singular  value  decomposition  of  A.  In  Section  3 we  shall  show 
that  under  certain  conditions  there  is  associated  with  A a subspacc 
that  is  insensitive  to  how  it  is  approximated  by  various  choices  of  the 
colurais  of  A,  and  in  Section  4 we  shall  apply  this  result  to  the  solution 
of  the  least  squares  problem.  Sections  5,  6,  and  7 will  be  concerned  with 
algorithms  for  selecting  a basis  for  the  stable  subspace  from  among  the 
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co limns  of  A. 

The  ideas  underlying  our  approach  are  by  no  means  new.  We  use  the 

singular  values  of  the  matrix  A to  detect  degeneracy  and  the  singular 

vectors  of  A to  rectify  it.  The  squares  of  the  singular  values  are 

the  eigenvalues  of  the  correlation  matrix  A*A,  and  the  right 

T 

singular  yectors  are  the  eigenvectors  of  A A,  that  is  the  principal 
components  of  the  problem.  The  use  of  principal  components  to  eliminate 
colincaritics  has  been  proposed  in  the  literature  (e.g.  see  [4,9,I6,17J). 
This  (taper  extends  these  proposals  in  two  ways.  First  we  prove  theorems 
that  express  quantitatively  the  results  of  deciding  that  certain  columns 
of  A can  be  ignored.  Second  we  describe  in  detail  how  existing  compu- 
tational techniques  can  be  used  to  realize  our  methods. 

A word  on  notation  is  appropriate  here.  We  have  assumed  a linear 
model  of  the  form  b * Ax  ♦ e,  where  b is  an  m-vcctor  of  observations 
and  x is  an  n-vector  of  parameters.  This  is  in  contrast  to  the  usual 
statistical  notation  iv.  which  the  model  is  written  in  the  form  y * xp  ♦ e, 
where  y is  an  r.-vector  of  observat  50ns  and  p is  a p-vector  of  parameters. 
T)-c  reason  for  this  is  that  we  wish  to  draw  on  a body  of  theorems  and 
algorithms  from  numerical  linear  algebra  that  have  traditionally  been 
couched  in  the  first  notation.  We  feel  that  this  dichotomy  in  notation  be- 
tween statisticians  and  numerical  analysts  has  hindered  communication 
between  the  two  groups.  Perhaps  a partial  solution  to  this  problem  is  the 
occasional  appearance  of  notation  from  nuwsrical  analysis  in  statistical 
journals  and  vice  versa,  so  that  each  group  may  have  a chance  to  learn  the 
other's  notation. 


W 


Throughout  this  paper  we  shall  use  two  norms,  ihc  first  is  the 


Fuclhiean  vector  norm  defined  for  an  n-vector  x by 

2 *1  ? 

?!x!i;  * l xr 
- i=l  1 

and  its  subordinate  matrix  norm  defined  by 


IjAli,  * sup  HAxI,,. 

».x||2-l 

Tlic  second  is  the  Frobenius  matrix  norm  defined  for  the  m r>  n matrix  A bv 


•s  w n s 

9All|*  l [a r 

f i“l  j«l  13 

Both  these  norms  are  consistent  in  the  sense  that 


HABIIp  < !!AiM|B||p  (p  - 2,F) 

whenever  the  product  AB  is  defined.  They  are  also  unitarily  invariant; 
that  is  if  U and  V are  orthogonal  matrices  then 

8AHp  - ;!UTA|Ip  » HAV||p  (p  * 2,F). 

lor  more  on  these  matrix  norms  see  [14J. 

2.  Rank  Degeneracy 

The  usual  mathematical  notion  of  rank  is  not  very  useful  wl»en  the 
matrices  in  question  are  not  known  exactly.  For  example,  suppose  that  A 
is  an  m * n matrix  that  was  originally  of  rank  r < n but  whose  elements 
have  been  perturbed  by  some  small  errors  (e.g.  rounding  or  measurement 
errors).  It  is  extremely  unlikely  that  these  errors  will  conspire  to 
keep  the  rank  of  A exactly  equal  to  r;  indeed  what  is  most  likely  is 
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that  the  perturbed  matrix  will  have  lull  rank  n.  Nonetheless,  tin' 
nearness  of  A to  a matrix  of  defective  rank  will  often  cause  it  to 
behave  erratically  when  it  is  subjected  to  statistical  and  numerical 
algorithms. 

One  way  of  circumventing  the  difficulties  of  the  mathematical 
definition  of  rank  is  to  specify  a tolerance  and  say  that  A is  numeri- 
cally defective  in  rank  if  to  within  that  tolerance  it  is  near  a defective 
matrix.  Specifically  we  might  say  that  A has  e-rank  r with  respect 
to  the  norm  (Ml  if 


(2.1)  r » inf  {rank(B) : |(A-B!|  < e}. 

However,  this  definition  has  the  defect  that  a slight  increase  in  r 
can  decrease  the  numerical  rank.  What  is  needed  is  an  upper  bound  on 
the  values  of  e for  which  the  numerical  rank  remains  at  least  equal  to 
r.  Such  a number  is  provided  by  any  number  6 satisfying 

(2.2)  s < 6 < sup  {rj:  ||A-B(i  < n - rank(B)  > r). 

Accordingly  wc  make  the  following  definition. 

Definition  2.1.  A matrix  A has  numerical  rank  (6,e,r)  with 
respco  to  the  norm  li *11  if  6,e,  and  r satisfy  (2.2)  and  (2.2). 

When  the  norm  in  definition  2.1  is  either  the  2-norm  or  the  Frobenius 
norm,  the  problem  of  determining  the  numerical  rank  of  a matrix  can  be 
solved  in  terms  of  the  singular  value  decomposition  of  the  matrix.  This 
decors  it  ion,  which  has  many  applications  (c.g.  see  (7]),  is  described  in 
the  following  theorem. 


Theorem  2.2.  Let  A be  an  m « n matrix  with  m • n.  Then  there 


is  an  orthogonal  matrix  U or  order  m and  an  orthogonal  matrix  V 
of  order  n such  that 

(2.3)  UTAV  = 

where 


and 


2 * diagfoj*^*  • • • » Ojj) 


a.  >a,  > •••  > or  > 0. 

1 £ n 

For  proofs  of  this  theorem  and  the  results  cited  below  see  [14] . The 
(umbers  o1to7,...,c  , which  are  unique,  arc  called  the  singular  values 
of  A.  The  coltmns  Uj,u2,...,um  of  U are  called  the  left  singular 
vectors  of  A,  and  the  columns  vi»v2»***»vn  arc  called  the  right  singular 
vectors  of  A.  The  matrix  A has  rank  r if  and  only  if 


(2.4) 


Cl  > 0 3 

r 


°r*l 


9 


in  which  case  the  vectors  Uj,u2,...,ur  form  an  orthonormal  basis  for  the 
column  space  of  A (hereafter  denoted  by  R(A)). 

It  is  the  intimate  relation  of  the  singular  values  of  a matrix  to 
its  spectral  and  Frooenius  norms  that  enables  us  to  characterize  numeri- 
cal lank  in  terms  of  singular  values.  Specifically  the  spectral  norm  of 
A is  given  by  the  expression. 


- 6 - 


Moreover,  if  tj  • i7  • ...  • are  the  singular  values  of  B - A ♦ I:,  then 

1*3-  i . | v IlHfi 2 (i  » l,2,...,n). 

In  view  of  (2.4)  this  implies  that 


(2.5) 


inf  !!A-B!|-  * o 
rank(B)<r  £ ^ 


and  this  infinuo  is  actually  attained  for  the  Matrix  B defined  by 
(2.6)  B 


■ (:')'• 


where  X*  - diag(Oj,o2,...tort0f...,0). 
Likewise 


II All p « o\  + o\  + ...  ♦ o~. 


and 


inf  liA-B!lp  * ♦ ...  ♦ a" 

rank(B)<r  r n 

The  infiMUR  is  attained  for  the  matrix  B defined  by  (2.6). 

Using  these  facts  we  can  characterise  the  notion  of  n«erical  rank. 

In  the  following  theorem  we  use  the  notation  rank  (6fe,r)^  to  mean  numeri- 
cal rank  with  respect  to  the  norm  !Mf  . 

Theorem  2.3.  Let  Oj  2 o2  > . ..  S on  be  the  singular  values  of  A„ 
Then  A has  m*»crical  rank  (f>,c»r)7  if  and  only  if 


oT>  6 >',>  ar+1. 


(2.7) 
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Also  A has  miner ical  rank  (5,  e,r)p  if  and  only  if 

2 2 2 2 2 2 1 

°v  * Vl  + *“  * cn  " 6 > c - °r*l  * •**  + V 

Proof.  We  prove  the  result  for  the  spectral  norm,  the  proof  for 
the  Frobenius  norm  being  similar.  First  suppose  that  (2.7)  holds.  Then 
by  (2.5)  if  i|B-A!|,  < 6 we  must  have  rank  (B)  > r.  Consequently  .5  satis- 
fics  (2.2).  This  also  shows  that 


min  (rank(B):  liB-Ajl  < c)  > r. 

But  the  matrix  B of  (2.6)  is  of  rank  r and  satisfies  s c. 

Hence  t satisfies  (2.1). 

Conversely,  suppose  6,e,  and  r satisfy  (2.1)  and  (2.2).  Then 
by  (2.5),  5 s or.  Also  e > c^;  for  if  not  by  (2.1)  there  is  a matrix 
B of  rank  r satisfying  !!A-Bj|  < or+1,  which  contradicts  (2.5).a 
Because  of  the  suif>licity  of  the  characterization  (2.7)  we  shall 
restrict  ourselves  to  rank  defectiveness  measured  in  terms  of  the  spectral 


norm. 


We  shall  need  two  other  facts  about  singular  values  in  the  sequel. 


First  define 


(2.8) 


inf  (A)  » inf  :,  Axi|?. 
Iixl!,*l 


inf (A)  ■ a 


where  on  is  the  smallest  singular  value  of  A.  Second,  let  X and  Y 
he  any  matrices  with  orthononral  colunns  and  let  Tj  > t,  > ...  > be 
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the  singular  values  of  C = X AY.  Then 

(2.9)  o.  ? t.  (i  = 1,2,. ..,k) 
and 

(2.10)  *tjt_ i+j  - Ci  “ l»2,...,k). 

3.  Die  e-Section  of  R(A) 

Having  confirmed  that  a matrix  A has  numerical  rank  (5,e,r)2 
with  r < n,  one  must  decide  what  to  do  about  it.  If  the  singular  value 
decomposition  has  been  computed  as  a preliminary  to  determining  the 
numerical  rank,  one  solution  naturally  presents  itself.  This  is  to  work 
with  the  matrix  B defined  by  (2.6).  Because  B has  an  explicit  repre- 
sentation  in  terns  of  s’,  the  usual  difficulties  associated  with  zero  singular 
values  can  be  avoided.  Moreover,  the  solution  so  obtained  is  the  exact 
solution  of  a small  perturbation  of  A. 

However,  this  solution  has  the  important  defect  that  it  does  not 
reduce  the  size  of  the  problem.  For  example,  if  the  problem  at  hand  is 
to  approximate  a vector  of  observations  b,  the  procedure  sketched  above 
will  express  the  approximation  as  a linear  combination  of  all  the  columns  of 
A,  even  though  some  of  them  are  clearly  redundant.  What  is  needed  is  a 
device  for  selecting  a set  of  r linearly  independent  colunns  of  A. 

In  Sections  5 and  6 we  shall  discuss  nuncrical  techniques  for  actually 
making  such  a selection.  In  this  section  and  the  next  we  shall  concern 


ourselves  with  the  question  of  when  making  such  a selection  is  sensible. 
The  main  difficulty  is  that  there  are  many  different  sets  of  r 
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linearly  independent  columns  of  the  matrix  A,  and  not  all  these 
sets  may  be  suitable  for  the  problem  at  hand.  For  example,  if  the 
problem  is  again  that  of  approximating  a vector  of  observations  b, 
then  for  each  set  of  columns  we  shall  attempt  to  find  a vector  in 
the  subspace  spanned  by  the  columns  that  is  in  some  sense  a best 
approximation  to  b.  Now  if  the  subspace  determined  by  a set  varies 
widely  from  set  to  set,  then  our  approximation  to  b will  not  be  sta- 
ble. Therefore,  we  turn  to  the  problem  of  determining  when  these 
subspaces  are  stable. 

We  shall  attack  the  problem  by  comparing  the  subspaces  with  a 
particular  subspace  that  is  determined  by  the  singular  value  decomposition. 
Let  A have  nunerical  rank  (6,e,r).  Let  the  matrix  U in  (2.3)  be 
partitioned  in  the  form 

U = (Ue,Uf), 

where  Ur  has  the  r columns  u^u^,... ,ur>  Then  we  shall  call  R(Uf ) 
the  e- sect ion  of  R(A).  Note  that  the  e-section  of  R(A)  is  precisely  the 
column  space  of  the  matrix  B defined  in  (2.6). 

We  shall  compare  subspaces  in  terms  of  the  difference  of  the  ortho- 
gonal projections  upon  them.  Specifically  for  any  matrix  X let  P 
denote  the  orthogonal  projection  onto  R(X).  Then  for  two  subspaces  R (X) 
and  R(Y)  we  shall  measure  the  distance  between  them  by  !IPj.-Py!j?  (for 
the  various  geometric  interpretations  of  this  number,  which  is  related 
to  canonical  correlations  and  the  angle  between  subspaccs,  see  [1,2,13]). 

It  is  known  that  if  Y has  orthonormal  columns  and  $ has  orthonormal 
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columns  spanning  the  orthogonal  complement  of  R(X),  then 


(3.1) 


IIPx-pY!|2  = llSTY|!2- 


The  selection  of  r columns  a.  ,a.  ,...,a.  from  the  matrix 

*r 


Y x2 


A = (a^a^..., an)  has  the  following  matrix  interpretation.  Let  W 

be  the  n * r matrix  formed  by  taking  columns  ij,i2,...,ir  from  the  n x n 

identity  matrix.  Then  it  is  easily  verified  that  (a.  ,a.  ,...,a.  ) = AW. 

T ll  l2  *r 

Of  course  W*W  »I,  so  that  V.*  has  orthonormal  columns,  and  this  is  all 

that  is  needed  for  the  following  comparison  theorem. 


Theorem  3.1.  Let  A have  numerical  rank  (6,e,r)2  and  let  Uf 
be  defined  as  above.  Let  W be  an  n * r matrix  with  orthonormal  colunns 
and  suppose  that 


(3.2) 


r * inf (AW)  > 0, 


where  inf(X)  is  defined  by  (2.8).  Then 

(3.3)  ||Py  -Pwll2  s 

c 

T T 

Proof.  The  matrix  W A AW  is  positive  definite  and  hence  has  a 

T T -1/2 

nonsingular  positive  definite  square  root.  Set  Y = AW(W  A AW)  . It 
is  easily  verified  that  Y has  orthonormal  columns  spanning  R(AW). 
Moreover,  from  (5.2) 

(3.4)  |!(WTATAH’)”1/2||2  « y'1. 

The  matrix  Uf  also  has  orthonormal  colunns,  and  they  span  the  orthogonal 
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complement  of  R(Uf).  It  follows  from  (2.3)  that 


(3.£) 


l|ll|A|i2  < e. 


Hence  from  (3,1),  (3.4),  and  (3.5) 


|PU  -PAK!i2  = lK5>(KW)'1/2li2 

s !!U1Aii0|!Wi|,!|(KTATW)"1/2l!2 


i e/y.u 


Theorem  3.1  has  the  following  interpretation.  The  nunbcr  y measures 
the  linear  independence  of  the  columns  of  AW.  If  it  is  small  compared  to 
Ah'  then  the  columns  of  AW  themselves  must  be  nearly  dependent.  Thus 
Theorem  3.1  says  that  if  we  can  isolate  a set  of  r colunns  of  A that 


approximation  to  the  e-section  R(Uf). 

However,  there  are  limits  to  how  far  we  can  go  with  this  process. 

By  (2.8)  the  number  y satisfies  or  > v,  and  by  the  definition  of  mmeri- 
cal  rank  e > or+1.  Consequently,  the  best  ration  we  can  obtain  in  (3.3)  is 
°r+l^°r*  ^1US  t^H?  t^eo^e,,,  001  very  meaningful  unless  there  is  a veil 

dofmed  gap  between  or+j  and  o.(*  One  cure  for  this  problem  is  to  in- 
crease r in  an  attempt  to  find  a gap;  however,  such  a gap  need  not  exist 
(e.g.  suppose  * Oj/2  (i  * l,2,...,n-l)).  What  to  do  when  the  matrix 
A exhibits  a gradual  rather  than  a precipitous  decline  into  degeneracy 
is  a difficult  problem,  whose  solution  must  almost  certainly  depend  on 
additional  information. 
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A second  difficulty  is  that  it  may  be  impossible  to  obtain  the 
ideal  ratio  because  in  practice  we  must  restrict  our  choice  of  W to 
colunns  of  the  identity  matrix;  i.e.  we  must  choose  from  among  colunns 
of  A.  That  this  is  a real  possibility  is  shown  by  the  following  example. 


Example  3.2.  Let  e^  denote  the  vector  (1,1,..., 1)T  with  n 
components.  The  matrix 


A * I 
n n 


n 


has  singular  values  1,1,. ..,1,0,  so  that  it  has  numerical  rank  (l,0,n-l)2. 
Thus  we  should  like  to  remove  a single  coltmn  of  An  to  obtain  an  approxi- 
mation to  the  0-section  of  A.  Owing  to  synmetry,  it  does  not  matter  which 
column  we  remove.  If  we  remove  the  last  one,  the  resulting  matrix  A^ 
has  the  form 


a; 


n 


e(n)e(n-l)T 
n ~ 


where  En  consists  of  the  first  n-1  columns  of  the  identity  matrix. 
Thus 


from  which  it  follows  that 
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= _L 

2 
and 

l 

Y * inf  (A£)  < \ . 

m 

' -1/2 

It  should  be  observed  that  the  factor  n exhibited  in  the 
example  is  not  extremely  small.  For  n = 25  it  is  only  1/5.  Unfortunately 
no  lower  bound  on  y is  known,  although  with  the  computational  algorithms 
to  be  described  in  Sections  5 and  6 it  is  easy  enough  to  check  the  com- 
puted value. 

A final  problem  associated  with  Theorem  3.1  is  that  it  is  not 
invariant  under  scaling.  By  scaling  we  mean  the  multiplicative  scaling 
of  rows  and  columns  of  A and  not  additive  scaling  such  as  the  subtrac- 
tion of  means  or  a time  factor  from  the  columis  of  A (this  latter 
scaling  can  be  handled  by  including  the  factors  explicitly  in  the  model) . 
Since  by  multiplying  a colum  by  a sufficiently  small  constant  one  can 
produce  as  small  a singular  value  as  one  desires  without  essentially  alter- 
ing the  model.  Theorem  3.1  can  be  coaxed  into  detecting  degeneracies  that 
are  not  really  there.  This  means  that  one  mus*  look  outside  the  hypo- 
theses of  Theorem  3.1  for  a natural  scaling.  While  we  are  suspicious  of 
pat  scaling  strategies,  we  think  that  the  following  criterion  is  reason- 
able for  many  applications.  Specifically,  the  rows  and  coiunns  of  A 
should  be  scaled  so  that  the  errors  in  the  individual  elements  of  A are 


A’ 

n 


.(n-l) 


as  nearly  as  possible  equal.  This  scaling  has  also  been  proposed  in 

[4] ,  and  an  efficient  algorithm  for  accomplishing  it  is  described  in 

[5] . 

The  rationale  for  this  scaling  is  the  following,  /ram  the  defini- 
tion of  the  singular  value  decomposition  it  follows  that 

AVi  * oiui  (i  * l,2,..,n). 

Now  if  we  imagine  that  our  matrix  is  in  error  and  that  our  true  matrix 
is  A + E,  then 

(3.6)  (A+E)v.  * o.u.  + Ev. . 

If  we  have  balanced  our  matrix  as  suggested  above,  then  all  of  the  elements 
of  E are  roughly  the  same  size,  and  llEVjl^  s l|E||2.  Thus  if  (c^j  < ||E!!7, 
equation  (3.6)  says  that  up  to  error  v.  is  a null  vector  of  A + E,  and 
the  matrix  is  degenerate. 

We  recognize  that  this  scaling  criterion  raises  as  many  questions  as 
it  answers.  An  important  one  is  what  to  do  when  such  scaling  cannot  be 
achieved.  Another  question  is  raised  by  the  observation  that  in  regres- 
sion row  scaling  is  equivalent  to  weighting  observations,  which  amounts 
to  changing  the  model.*  Is  this  justified  simply  to  make  Theorem  3.1 
meaningful?  Although  this  question  has  no  easy  answer,  we  should  like  to 
jwint  out  that  it  nay  be  appropriate  to  use  one  scaling  to  eliminate 

colinearities  in  A and  another  for  subsequent  regressions, 

w 

We  are  indebted  to  John  Chambers  and  Roy  Welsh  for  pointing  this  out. 
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In  the  next  section  we  are  going  to  examine  the  implications  or 
Theorem  3.1  for  the  linear  least  squares  problem  in  which  a vector  of 
observations  b is  optimally  approximated  in  the  2-norm  by  linear  combina- 
tions of  the  columns  of  A: 


b a?  Ax. 

% 

In  some  applications  the  2-nom  is  not  the  best  possible  choice,  and  one 
may  wish  to  minimize  ip(b-Ax),  where  <p  is  a function  that  nay  not  even 
be  a norm.  For  exaaple,  in  robust  regression  one  approach  is  to  minimize 
a function  that  may  reduce  the  influence  of  wild  points.  Me  shall  not  pur- 
sue this  subject  here;  but  we  believe  that  Theorem  3.1  has  important  impli- 
cations for  these  problems.  Namely,  if  we  are  searching  for  an  approxi- 
mation to  b in  R(A),  we  cannot  expect  the  solution  to  be  well  determined 
unless  R(A)  itself  is.  Theorem  3.1  provides  a theoretical  basis  for 
finding  stable  subspaces  of  R(A);  however,  specific  theorems  must  wait 
the  development  of  a good  perturbation  theory  for  approximation  in  norms 
other  than  the  2-norm. 

4.  The  Linear  Least  Squares  Problem 

In  this  section  we  shall  consider  the  linear  least  squares  problem 

2 

(4,1)  minimize  llb-Ax  |l2. 

It  is  well  known  that  this  problem  always  has  a solution,  which  is  unique 
if  and  only  if  A is  of  full  colimm  rank.  At  the  solution,  tho  residual 
vector 


r * b - Ax 
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is  the  projection  of  b onto  the  orthogonal  complement  of  R(A). 

When  A has  numerical  rank  (5,c,r)2»  the  solution  to  (4.1)  may 
be  large*  and  some  of  the  individual  components  of  the  solution  will 

i 

certainly  have  large  variances.  If  the  ratio  e/6  is  sufficient];/ 
small  a stable  solution  can  be  computed  by  restricting  oneself  to  the 
e-section  of  A.  Computationally  this  can  be  done*  as  follows.  Define 
U and  U as  in  Section  3,  and  further  define 

vc  * <vv2 V>  * (Vl’-'V 

and 


zc  * diag(oj,c*2»» • • »or)» 


Then  the  matrix  B of  (2.6)  is  given  by 


B-UZV*. 
c e e 


Moreover  the  vector 


x,  - V Z*Vb 

c e c e 


is  the  unique  solution  of  the  problem  of  minimizing 


lib-Bxlj. 


that  is  of  minimum  2-nora.  It  is  easily  seen  that 


r * b - Ax  * b - Bx  . 

c e £ 
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As  wc  indicated  in  the  last  section,  this  solution  is  not  entirely 
satisfactory,  sir-  e it  involves  all  the  colinns  of  A,  whereas  we  might 
!x>pe  to  obtain  a satisfactory  representation  of  b in  terms  of  r 
suitably  chosen  colinns;  that  is  with  a model  having  only  r carriers. 
It  is  a consequence  of  Theorem  3.1  that  any  set  of  r reasonably  inde- 
pendent colinns  will  do,  although  in  practice  additional  considerations 
may  make  some  choices  preferable  to  others. 

Tlieorem  4.1.  Assisting  the  notation  and  hypothesis  of  Theorem  3.1, 
let  x,  and  r be  defined  as  above.  Let  yw  be  the  solution  of  the 

t C R 

linear  least  squares  problem 


minimize  !|b-AMy||. 


and  let  be  the  residual 


* b - AWy^. 


Then 


,VtWl2  _ , 

~ rarr 5 e/r- 


Proof.  By  the  proj-erties  of  the  least  squares  residual 

rt  * )b  and  tjj  * (I-P^)b.  Hence 

e 

llrc-  r^'^  a ii  (Py  “Pyy|j) b|| 2 - “ !ib)f 2*  cs 
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that  one  such  set  will  not  fit  b better  than  another?  The  answer  is 
that  if  there  is  a well  defined  gap  between  6 and  c,  then  any  set  of 
r strongly  independent  colmns  will  give  approximately  the  same  resi- 
dual. However,  there  remains  the  possibility  that  by  including  more 
columns  of  A a considerably  smaller  residual  could  be  obtained.  We 
stress  that  such  a solution  cannot  be  very  stable.  By  (2.8)  any  matrix 
consisting  of  more  than  r columns  of  A must  have  a singular  value 
less  than  or  equal  to  c,  and  it  follows  from  the  perturbation  theory 
for  the  least  squares  problem  (15)  that  the  solution  must  be  sensitive 

to  perturbations  in  A and  b.  (Another  way  of  seeing  this  is  to  note 
-2  T -1 

that  c is  a lower  bound  for  II (A  A)  ll2,  so  that  the  solution  must  have 
a large  covariance  matrix.) 

However,  one  might  be  willing  to  put  up  with  the  instabilities  in  the 
solution  provided  it  gives  a good  approximation  to  b.  We  shall  now  show 

that  any  solution  that  substantially  reduces  the  residual  over  r is  not 

c 

only  unstable,  it  is  also  large. 


Theorem  4.2.  Let  r be  defined  as  above.  Civc-i  the  vector 

_ " 1 V 


r * b - Ax.  If  fircll2  > llr||2,  then 


ilxll. 


«rcll2-ilril2 


Proof.  Let  z * V x and  let 


x,  let 
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T *T  T 

where  c is  an  n-vector.  Then  if  we  partition  z = (z  ,z  ) and 

C 6 

T aT  T 

c * (c  ,c  ) conformally  with  the  previous  partitions  of  U,  V,  and 
X,  we  have 


l!r|||  3 ||UT(b-AWTx)||* 


2 

2 


Hc-Zz||~  ♦ 


2 

2 


»VVe'5  + 


wA  * «r 


Consequently 

(4.2)  llrllf  » lice-Sc£clt|  * lidll*. 

T 

Now  the  vector  y * V x.  is  given  by 

o C 


y 


c 


so  that 

(4.5)  i|rcl||  * llcjlj  * l!d|||. 

From  (4.2) 


W2-.m‘2  > e fi£c!l2  - »2tl!2l!5cil2 


5 ll£t!l2  - e||zt!!2. 
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Hence 


e 


> 


and  from  (4.5) 


llxli2  > 


/flrjlj-lldlll  - /[rn|-  Kdjif 


. I|ri<2  • "r»2 


'Hie  theorem  shows  that  even  a slight  decrease  in  the  residual  must 
result  in  a great  increase  in  the  size  of  the  solution.  It  is  hardly 
necessary  to  add  that  a large  solution  is  seldom  acceptable  in  practice: 
it  must  have  high  variance,  and  it  may  be  physically  meaningless. 

The  results  of  this  section  have  implications  for  a common  practice 
in  data  analysis,  namely  that  of  fitting  a large  nunber  of  subsets  of  the 
colunns  of  A in  an  attempt  to  obtain  a good  fit  with  fewer  than  the  full 
complement  of  coluans  (for  example,  see  [6]).  We  have,  in  effect,  shown 
that  if  the  ratio  e/6  is  reasonable,  this  procedure  is  not  likely  to  be 
very  productive.  Any  set  of  r independent  columns  will  give  about  the 
same  residual,  and  any  larger  set  that  significantly  reduces  the  residual 
must  produce  an  unacceptably  large  solution.  There  are,  however,  two  cases 
where  this  procedure  might  be  of  some  help.  First  when  it  is  hoped  that 
fewer  than  r colums  can  produce  a good  fit,  and  second  when  the  e-6 
ration  is  not  very  small.  An  approach  to  the  second  problem  that  uses  the 
singular  value  decomposition  of  the  augmented  matrix  (A,b)  is  described 
in  [9]  and  (16,17). 
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5.  Extraction  of  Independent  Colunn.s:  the  QR  Factorization 

We  now  turn  to  the  problem  of  extracting  a set  of  numerically  inde- 
pendent colunns.  The  first  metlri  we  shall  consider  is  based  on  the  QR 
factorization  of  the  matrix  A.  Specifically,  if  A is  an  m * n matrix 
with  m d n,  then  A can  be  written  in  the  form 


A » QR, 


T 

where  Q has  orthonormal  columns  (Q  Q*I)  and  R is  upper  triangular. 

If  A has  full  colunn  rank,  then  the  factorization  is  unique  up  to  the 
signs  of  the  colunns  of  Q and  the  corresponding  rows  of  R.  It  should 
be  noted  that  the  columns  of  Q form  an  orthonoimal  basis  for  R(A) . 

A knowledge  of  the  QR  factorization  of  A enables  one  to  solve  the 
least  squares  problem  (4.1).  Specifically,  any  solution  x of  (4.1) 
must  satisfy  the  equation 

Rx  * QTb, 

which  can  be  easily  solved  since  R is  upper  triangular.  Moreover,  since 
T T 

A A * R R,  we  have 


(aV1 


r-Vt 


so  that  one  can  use  the  matrix  R in  the  factorization  to  estimate  the 
covariance  matrix  of  the  solution. 

An  especially  desirable  feature  of  the  QR  factorization  is  that  it 
can  he  used  to  solve  a truncated  least  squares  problem  in  which  only  an 
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i r 

initial  set  of  colunns  arc  fit.  If  A denotes  the  matrix  consisting 

of  the  first  r colurais  of  A and  R denotes  the  leading  principal 

aubmatrix  of  order  r of  A,  then 

(5.1)  A,r  = Q,rR^. 

Since  R,r  is  upper  triangular  and  Q*r  has  orthonormal  columns, 
equation  (5.1)  gives  the  QR  factorization  of  A,r  and  can  be  used  as 
described  above  to  solve  least  squares  problems  involving  A*r. 

The  basis  for  using  the  QR  factorization  to  extract  a linearly 
independent  set  of  columns  from  the  matrix  A is  contained  in  the 
following  theorem. 

Theorem  5.1.  Let  the  QR  factorization  of  A be  partitioned  in  the 

form 

(Aj,a2)  » (Q.,Q,)(Ru  11,2 

1 V>  r22 

where  Aj.Qj  € and  Rn  S Rrxr.  If 

HR22®  2 " 1 K 6 * inf(R|jj'. 

then  A has  rank  (rt,ctr)2.  Moreover, 

inf(Ajj)  * 5. 

Proof.  Because  the  colunns  of  Q are  or  sonormal,  the  singular 
values  of  A and  of  R are  the  same.  Now  5 is  the  r-th  sii^ular 
value  of  Rjj,  and  hence  by  (2.9)  6 is  less  than  or  equal  to  the  r-th 
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singular  value  of  A;  i.e.  or  > 6.  Likewise  from  (2.10),  e > ar+1. 

Thus  A has  rank  (6, e,r).  Moreover,  since  Q^  has  orthonormal  columns, 

inf (Aj)  = inffQjR^)  = inffR^)  * 6.o 

The  application  of  this  theorem  is  obvious.  If,  after  having 
computed  the  QR  factorization  of  A,  we  encounter  a small  matrix  R22 
and  a matrix  R^  with  a suitably  large  infinum,  then  the  columns  of 
Aj  span  a good  approximation  tc  the  r-.section  of  A.  Because 
of  (5.1),  we  have  at  hand  the  QR  factorization  of  Aj  and  can  proceed 
immediately  to  the  solution  of  least  squares  problems  involving  Aj. 

There  remain  two  problems.  First  how  can  one  insure  that  the  first  r 
columns  of  A are  linearly  independent,  and  second  how  can  one  estimate 
inf(Ru)? 

The  solution  to  the  first  problem  depends  on  the  method  by  which  the 
QR  factorization  is  confuted.  Probably  the  best  numerical  algorithm  is 
one  based  on  Householder  transformations  in  which  the  QR  factorizations 

Alk  = QlkRn;  arc  computed  successively  for  k = l,2,...,n  (e.g.  sec  1 14 } ) . 

Ik  Ik 

At  the  k-th  step,  just  before  Q and  R are  computed,  there  is#the 
possibility  of  rep  acing  the  k-th  column  of  A by  one  of  the  columns 
Vrak+2’,,,’V  **  t*ie  column  that  maximizes  the  (k,k)  - element  of 

R is  chosen  to  replace  aj.,  then  there  will  be  a tendency  for  indepen- 
dent columns  to  be  processed  first,  leaving  the  dependent  columns  at  the 
end  of  the  matrix.  An  ALGOL  program  incorpo: -wing  this  "column  pivoting" 
is  given  in  [3]  and  a FORTRAN  program  is  given  in  [11]. 
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JWi , w sUf'J J nu - 


i 

Once  a satisfactory  QR  decomposition  has  been  calculated,  we  can 
estimate  IIR22^2  by  the  1,01,1x1 


■^22^2  * *^22^1  ^22^ 


» 


where 

\ II X|L  = max  2 |x..  | 

j i J 

and 


ilX||  * max  2 |x..|. 

i j ij 

Likewise  one  can  estimate  inf(Rjj)  by  coniiuting  Rjj  (an  easy  task 
since  Rn  is  upper  triangular)  and  using  the  relations 


inf(Rjj) 


mlW ? 


The  procedure  sketched  above  is  completely  reliable  in  the  sense 
that  it  cannot  fool  one  into  thinking  a set  of  dependent  columns  are 
independent.  However,  it  can  fail  to  obtain  a set  of  linearly  indepen- 
dent columns,  as  the  following  example  shows. 

lix ample  5.2.  Let  An  be  the  matrix  of  order  n • illustrated  below 

for  n = 5: 


f 

[ 
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1 

-i  fSi 

-i/J5 

-lM 

-i/*S 

0 

U7L 

-i/A 

-l/A 

-i/JS 

0 

0 

1/.7 

-lM 

-l/JS 

0 

0 

0 

l/A 

-l//? 

0 

0 

0 

0 

1//5 

Letting  xj  * (1,^/2, ^f/4,/T/8,...,»€/2n'1),  it  is  easily  verified  that 

Ax  * 2"ne 
n n 

where  e1  * (1,1,..., 1).  Thus  has  the  approximate  null  vector  xR 
and  must  have  nearly  dependent  columns.  However,  computing  the  QR  factori- 
zation of  AJ4,  even  with  column  pivoting,  leaves  Ar  undisturbed.  Since 
no  element  of  An  is  very  small,  we  shall  have  R22  void;  i.e.  no  depen- 
dent colunn  will  be  found. 

It  should  be  observed  that  in  the  above  example  there  is  no  danger 
of  the  degeneracy  in  A^  going  undetected.  Since  R22  is  void,  R^j  * Afl 
and  any  attempt  to  estimate  inf(R^)  will  reveal  the  degeneracy. 

It  may  be  objected  that  the  matrix  An  in  Example  5.2  shows  an 

-1/2 

obvious  sign  of  degeneracy;  viz.  its  determinant  (n!)  goes  rapidly 
to  zero  with  increasing  n.  However,  the  matrix  |An(,  obtained  from  AR 
by  taking  the  absolute  value  of  its  elements,  has  the  same  determinant 
yet  its  columns  are  strongly  independent.  Thus  the  example  confirms  a 
fact  well  known  to  practical  computers:  the  value  of  a determinant  is 
worthless  as  an  indication  of  singularity. 
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6.  lixtraction  of  Independent  Colunns:  the  Singular  Value  Decomposition 
When  the  singular  value  decomposition  of  A has  been  confuted  (an 
ALGOL  program  is  given  in  [8]  and  a FORTRAN  program  in  [11]),  a different 
way  of  selecting  independent  columns  is  available.  The  method  is  based  on 
the  following  'heorem. 

Theorem  6.1.  Let  A have  the  singular  value  decomposition 

uTw-(o)- 

ixjt  V be  partitioned  in  the  form 


where  is  r * r,  and  let  A be  partitioned  in  the  form 

A * (Aj * 

where  Aj  has  r colunns.  Let  6 * ar , e * and 

r * 6inf(Vel). 

Then  A has  miner ical  rank  (6,F.,r)2  and 

(6. I ) inf(Aj)  > y. 

Proof . The  fact  that  A has  numerical  rank  (6,e,r)2  follows 
immediately  from  Theorem  2.3.  To  establish  (6.1),  observe  that  if  we 
write 
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AV  ® S = (Sj,S2) 


T T 

where  has  r columns,  then  S^S2  * 0.  Now  since  A « SV  , we  have 

*1  ’ SlVIl  * S2*J  • 


Since  S^S2  * 0, 

inf(Aj)  > infCS^)  > inf (Sj)  inf (V^) 

* ar  inf(V  j)  » y.o 

As  with  the  QR  factorization.  Theorem  6.J  provides  us  with  a way  of 
determining  when  an  initial  set  of  r columns  of  A are  independent. 
Since  an  initial  set  may  be  degenerate,  we  must  adopt  some  kind  of  inter- 
change strategy  to  bring  an  independent  set  of  colimns  into  the  initial 
positions.  If  P is  any  permutation  matrix,  then 


UT(AP) (PTV) 


0- 


so  that  in  the  singular  value  decomposition  an  interchange  of  columns  of 
A corresponds  to  an  interchange  of  the  corresponding  rows  of  V.  This 
suggests  that  we  exchange  rows  of  V until  inf(Vfj)  becomes  acceptably 

large.  One  way  of  accomplishing  this  is  to  start  with  the  r * n matrix 

• 

VT  * (VT  VT  ) 

V1  { tl’c2J 


and  compute  its  QR  factorization  with  colunn  pivoting  to  force  a set 
of  independent  columns  into  the  first  r positions.  Alternative’)  one 
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could  apply  an  algorithm  such  as  Gaussian  elimination  with  complete 
T 

pivoting  to  Vj  (c.g.  see  [14]). 

If  either  of  the  above  suggestions  is  followed,  the  final  matrix 
T 

Vfj  will  be  upper  triangular,  and  its  infimum  can  be  bounded  by  the  method 
suggested  in  the  last  section. 

If  r is  small,  significant  savings  can  be  obtained  by  observing 
that  the  singular  values  in  [0,1)  of  Vfil  and  \ ^ are  the  same  (see 
the  appendix  of  [15]  for  a proof).  Thus  one  can  start  with  the  smaller 
matrix 

C6.2)  V*  . C?y*2) 

and  use  the  QR  factorization  with  column  pivoting  to  determine  the 
dependent  columns  of  A.  Note  that  when  r * n-1  the  column  to  be  stricken 
corresponds  to  the  largest  clement  of  the  row  vector  v£. 

The  question  of  whether  to  use  the  QR  factorization  or  the  singular 
value  decomposition  is  primarily  one  of  computational  efficiency.  Although 
Example  5.2  shows  that  the  QR  factorization  can  fail  to  isolate  a set  of 
independent  coluims  in  a case  where  the  singular  value  decomposition  does, 
this  is  an  unusual  phenomenon  (see  Example  7.2)  and  in  most  cases  the  QR 
factorization  with  colimn  pivoting  is  effective  in  locating  independent 
columns.  When  m is  not  too  much  greater  than  n,  the  calculation  of  the 
singular  value  decomposition  is  considerably  more  expensive  than  the 
calculation  of  the  QR  factorization,  and  it  is  more  efficient  to  stick  with 
the  latter,  if  possible. 
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When  m » n,  we  can  begin  by  confuting  the  QR  factorization  of  A. 

The  matrix  R has  the  same  singular  values  as  A,  and  indeed  if 

(6.3)  UTRV  = I 

is  the  singular  value  decomposition  of  R,  then  V is  the  matrix  of 
right  singular  vectors  of  A.  Since  R is  an  n x n matrix,  the  reduc- 
tion (6.3)  is  computationally  far  less  expensive  than  che  initial  com- 
putation of  R,  and  there  seems  to  be  no  reason  not  to  use  the  singular 
value  decomposition. 

7.  Examples 

In  this  section  we  shall  give  some  examples  illustrating  the  pre- 
ceding material.  The  numerical  computation.;  were  done  in  double  precision 
on  an  IBM  360;  i.e.  to  about  sixteen  decimal  digits. 

Example  7.1.  This  example  has  been  deliberately  chosen  to  be  un- 
complicated. For  fixed  n,  let 

H * I - | eeT, 
n n 

T 

where  e * (1,’,...,1).  It  is  easily  verified  that  Hr  is  orthogonal. 

Let 


1 « diag(l,l, 1,1,1, 0,0,0, 0,0) 
A * HS0  ^ o ) Hl0‘ 


and 
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Then  A has  five  nonzero  singular  values  equal  to  unity  and  five  zero 
singular  values,  and  thus  it  should  have  five  linearly  independent 
colunns. 

The  singular  values  of  A were  computed  to  lie  1,1, 1,1,1, .35x10 

0,0, 0,0,  so  that  A can  be  regarded  as  having  rank  (l,e,5)  where 

e = 10  The  pivoting  strategy  described  in  Section  (>  was  used  to 

isolate  a set  of  five  linearly  independent  colunns.  These  turned  out  to 

be  columns  1,2, 4, 5,  and  9.  The  associated  matrix  Vj  had  an  infimum 

of  .45  which  is  very  close  to  the  optimal  value  of  unity.  As  a final 

check,  we  compute  ||Py  'PAW1!»  where  W * (ej^.e^e^eg)  is  the  matrix 

c 

that  selects  the  independent  colunns  from  A (cf.  Theorem  3.1),  The 
result  is 


l|PU  -pAW!l2  * *37  x 10’14» 


which  shows  that  colunns  1,2, 4, 5,  and  9 of  the  matrix  A almost  exactly 
span  the  r- sect ion  of  A. 

The  QR  factorization  with  column  pivoting  that  is  described  in  Sec- 
tion 5 was  also  applied  to  A.  The  pivot  colunns  and  their  norms  were 

5 
4 
2 
3 

6 
1 

7 

8 
9 

10 


.89 

.86 

.81 

.71 

.44 

.45 

.13 

0 

0 

0 


X 10 
X 10 


-16 

-30 
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If  the  gap  is  taken  to  lie  after  the  fifth  vector  we  have 

infOy  - 1,  IIR^ li2  * .20  x 10'16. 

Thus  the  QR  factorization  exhibits  the  same  sharp  gap  as  the  singular 
value  decomposition.  However,  the  five  columns  2, 3, 4, 5,  and  6 desig- 
nated as  independent  are  different  from  those  chosen  by  means  of  the 
singular  value  decomposition.  Nonetheless,  for  W * (e2,e3,e4,e5,ef) 
we  have 

!lpu  "pak!i  * *37  x 10"14» 

e 

so  that  this  choice  of  columns  is  as  good  as  the  one  predicted  by  the 
singular  value  decomposition. 

Incidentally  the  estimate  of  IIR22II2  1,5 inR  the  1“  and  —norms  i- 

/llR22lli,!R2211-  * '94  x 10'16» 

which  is  not  a gross  overestimate. 

Example  7.2.  This  is  the  matrix  A25  of  Example  5.2.  The  singular 
values  of  this  matrix  are 

^ * 6, • . • ,^24* • ^1 >^25*  * ^ * 10  . 

Again  there  is  a well  defined  gap,  and  we  may  take  A to  have  rank 
(.3l,e,24)  where  f * 10  . This  time  there  is  only  a single  dependent 
vector  which *can  be  found  by  looking  for  the  largest  corjwnent  of  the 
right  singular  vector  v2S  corresponding  to  o25  (cf.  the  conroents  at 
equation  (6.2)).  This  component,  .75,  is  the  first,  which  indicates 


- 32  - 


that  column  one  should  be  discarded,  for  this  selection  we  have 

!,pu  * *49  * 10  ?- 

c 

In  principle,  the  QR  factorization  should  fail  to  isolate  a depen- 
dent colurm  of  However,  because  the  elements  of  A25  were 

entered  with  rounding  error,  the  pivot  order  with  coition  norms  turned 
out  to  be 


1 1 
25 
6 


24 
2 

This  again  gives  a well  defined  gap  and  indicates  that  column  2 should 
be  thrown  out  (the  second  component  of  V25  is  .53  so  that  also  from  the 
point  of  view  of  the  singular  value  decomposition  the  second  colum  is 
a candidate  for  rejection).  For  this  subspace  we  have 

,|pu  'paw!I  * -11  x 10’6- 

e 

Thus  the  QR  factorization  gives  only  slightly  worse  results  than  the 
singular  value  decomposition,  in  spite  of  the  fact  that  the  example  was 
concocted  to  make  the  QR  decomposition  fail.  * 


98 

88 

37  ift 
15  x in'16 


iixample  7.3.  To  show  that  our  theory  may  be  of  some  use  even  where 
there  is  not  a sharply  defined  gap  in  the  singular  values,  we  consider  the 
Longley  test  data  [12],  which  has  frequently  been  cited  in  the  literature. 
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Since  it  is  * coranon  practice  to  subtract  means  from  raw  data,  we  have 
included  a colimn  of  ones  in  the  model.  Specifically  the  colunns  of 
A are  as  follows: 

1 --  ones 

2 --  QfP  Implicit  Price  Deflator,  1954  - 100 

3 - QfP 

• 4 --  Unemployment 

5 --  Size  of  armed  forces 

6 — Noninstitutional  population  > 14  years  old 

7 --  Time  (years) 

The  scaling  of  this  data  will  critically  affect  our  results.  For  the 
purposes  of  this  experiment  we  assume  that  colunns  two  through  six  arc 
known  to  about  three  significant  figures.  Accordingly  each  of  these 
colunns  was  multiplied  by  a factor  that  made  its  mean  equal  to  500. 

The  colimn  of  ones  is  known  exactly  and  by  the  equal  error  scaling 
criterion  ought  to  be  scaled  by  a factor  of  infinity.  As  an  approxima- 
tion we  took  the  scaling  factor  to  be  1010. 

The  colunn  of  years  can  be  treated  in  twc  ways.  First  the  errors  in 
tlie  time  of  measurement  can  be  attributed  to  the  column  itself,  which 
would  result  in  the  colimn  being  assigned  a low  accuracy.  However,  we 
observe  that  any  constant  bias  in  the  time  of  measurement  is  accounted 
for  by  the  colunn  of  ones,  and  any  other  errors  can  be  attributed  to  the 
measured  data.  Consequently  we  have  preferred  to  regard  the  years  as 
known  exactly  and  scale  the  seventh  colunn  by  101®. 


The  singular  values  of  the  matrix  thus  scaled  arc 
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.78  x 1014 
.94  x 108 
.58  x 103 
.26  x I03 
.26  x 102 
.22  x 102 

.51  x IQ1 


Since  the  error  in  A is  of  order  unity,  ;he  last  singular  value  must 
he  regarded  as  pure  noise,  and  we  may  take  A 'o  have  rank  (22,5. 1,6^2* 
The  largest  component  of  the  seventh  singular  vector  is  the  sixth  and  has 
a value  of  .90.  When  the  sixth  column  is  removed  from  the  matrix,  the 
resulting  subspace  compares  with  U,.  ^ as  follows: 


IIP, 


U 


-P 


5.1 


AW11 


.12. 


The  relatively  poor  determination  of  the  5 -section  of  A suggests 
tliat  not  much  useful  information  can  be  obtaines  from  a least  squares 
fit,  even  when  the  sixth  column  is  ignored.  Th.  next  gap  that  presents 
itself  is  between  the  fourth  and  fifth  singular  values.  If  we  regard  A 
as  having  rank  (260,26,4)2  and  use  the  pivoting  strategy  of  Section  6 to 
isolate  a set  of  four  independent  columns,  we  choose  columns  1,4,5,  and  7 
with 


inf(Vei)  * .991. 

For  this  choice  of  columns 

llP»J  .p  I!  m 0 Oil 
U26G  *AK  U.Ull, 
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a far  more  satisfactory  result. 

If  the  QR  factorization  is  applied  to  A,  there  results  the  follow- 
ing sequence  of  pivot  columns  and  norms: 

7 .78  x 1014 

1 .94  x 108 

5 .47  x 103 

4 .31  x 103 

2 .24  x 102 

3 .21  x 102 

6 .57  x 101 

This  agrees  completely  with  the  results  from  the  singular  value  decomposi- 
tion. Either  one  or  three  colimns  should  be  discarded,  and  colunns  6,  2, 
and  3,  in  that  order,  are  candidates. 

Although  these  results  indicate  that  colimns  2,  3,  and  6 should  be 
discarded  from  the  model,  they  are  not  conclusive,  since  there  may  be 
other  sets  containing  some  of  these  colunns  that  give  a satisfactory 
approximation  to  the  260-section  of*  A.  However,  a singular  value  decompos- 
ition of  the  matrix  consisting  of  colunns  1,2, 3, 6,  and  7 gives  the  singular 
values 

.78  x lo14 
.9^  x 108 
.50  x 102 
.25  x 102 

.10  x 102 

which  shows  that  none  of  these  colunns  is  a really  good  candidate  for  inclu- 
sion in  the  model. 


- 36  - 


To  siot  up:  if  the  raw  long lev  data  is  taken  to  bo  accurate  to  three 

significant  figures,  if  years  arc  assimed  to  be  exact,  and  if  means  are 
subtracted  from  the  colunns,  then  the  colurtn  corresponding  to  noninstitu- 
tional  population  is  redundant,  and  the  columns  corresponding  to  the  GNP 
implicit  price  deflator  and  Lite  GNP  are  so  nearly  redundant  that  their 
inclusion  in  the  model  will  affect  the  stability  of  the  residuals  from  any 
regressions. 
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